Energy-efficient algebra kernels in FPGA for High Performance Computing

Authors

Abstract

The dissemination of multi-core architectures and the later irruption of massively parallel devices led to a revolution in High-Performance Computing (HPC) platforms in the last decades. As a result, Field-Programmable Gate Arrays (FPGAs) are re-emerging as a versatile and more energy-efficient alternative to other platforms. Traditional FPGA design implies using low-level Hardware Description Languages (HDL) such as VHDL or Verilog, which follow an entirely different programming model than standard software languages, and their use requires specialized knowledge of the underlying hardware. In recent years, manufacturers have started to make big efforts to provide High-Level Synthesis (HLS) tools, in order to allow a greater adoption of FPGAs in the HPC community.

Our work studies hardware to address Numerical Linear Algebra (NLA) kernels such as general matrix multiplication (GEMM) and sparse matrix-vector multiplication (SpMV). Specifically, we compare the behavior of fine-tuned kernels on a CPU processor with HLS implementations on FPGAs. We perform an experimental evaluation of our implementations on low-end and cutting-edge platforms, in terms of runtime and energy consumption, and compare the results against the Intel MKL library on CPU.
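For readers unfamiliar with the two kernels named in the abstract, the following is a minimal reference sketch in plain C++ of what GEMM and SpMV compute. It is not the authors' implementation: the function names `gemm` and `spmv`, the `CsrMatrix` struct, the row-major layout, and the choice of the CSR sparse format are all assumptions made for illustration, and the tuning that an HLS or MKL version would apply is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense GEMM reference: C = A * B with row-major storage.
// A is m x k, B is k x n, C is m x n. (Layout is an assumption.)
std::vector<double> gemm(const std::vector<double>& A,
                         const std::vector<double>& B,
                         std::size_t m, std::size_t k, std::size_t n) {
    assert(A.size() == m * k && B.size() == k * n);
    std::vector<double> C(m * n, 0.0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t p = 0; p < k; ++p) {
            const double a = A[i * k + p];
            // Innermost loop walks contiguous rows of B and C.
            for (std::size_t j = 0; j < n; ++j) {
                C[i * n + j] += a * B[p * n + j];
            }
        }
    }
    return C;
}

// Hypothetical CSR (compressed sparse row) container, assumed here
// because the abstract does not name a sparse storage format.
struct CsrMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> row_ptr;  // rows + 1 offsets into col_idx/values
    std::vector<std::size_t> col_idx;  // column index of each nonzero
    std::vector<double> values;        // value of each nonzero
};

// SpMV reference: y = M * x for a CSR matrix M.
std::vector<double> spmv(const CsrMatrix& M, const std::vector<double>& x) {
    assert(x.size() == M.cols && M.row_ptr.size() == M.rows + 1);
    std::vector<double> y(M.rows, 0.0);
    for (std::size_t i = 0; i < M.rows; ++i) {
        double acc = 0.0;
        // Accumulate only over the stored nonzeros of row i.
        for (std::size_t e = M.row_ptr[i]; e < M.row_ptr[i + 1]; ++e) {
            acc += M.values[e] * x[M.col_idx[e]];
        }
        y[i] = acc;
    }
    return y;
}
```

The contrast between the two loops hints at why the kernels behave differently on hardware: GEMM has regular, predictable memory accesses, while SpMV reads `x` through an index array, which makes its access pattern data-dependent.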


Related resources

High-Performance Linear Algebra Processor using FPGA

With recent advances in FPGA (Field Programmable Gate Array) technology, it is now feasible to use these devices to build special-purpose processors for floating-point-intensive applications that arise in scientific computing. FPGAs provide programmable hardware that can be used to design custom hardware without the high cost of traditional hardware design. In this talk we discuss two multi-proc...


PAM-Blox: High Performance FPGA Design for Adaptive Computing

PAM-Blox are object-oriented circuit generators on top of the PCI Pamette design environment, PamDC. High-performance FPGA design for adaptive computing is simplified by using a hierarchy of optimized hardware objects described in C++. PAM-Blox consist of two major layers of abstraction. First, PamBlox are parameterizable simple elements such as counters and adders. Automatic placement of carry ...


IANUS: an FPGA-based System for High Performance Scientific Computing

This paper describes IANUS, a modular massively parallel and reconfigurable FPGA-based computing system. Each IANUS module has a computational core and a host. The computational core is a 4x4 array of FPGA-based processing elements with nearest-neighbor data links. Processors are also directly connected to an I/O node attached to the IANUS host, a conventional PC. IANUS is tailored for, but not...


Energy-Efficient FPGA-Based Parallel Quasi-Stochastic Computing

The high performance of FPGA (Field Programmable Gate Array) in image processing applications is justified by its flexible reconfigurability, its inherent parallel nature and the availability of a large amount of internal memories. Lately, the Stochastic Computing (SC) paradigm has been found to be significantly advantageous in certain application domains including image processing because of i...



Journal

Journal title: Journal of computer science and technology

Year: 2021

ISSN: 1666-6046, 1666-6038

DOI: https://doi.org/10.24215/16666038.21.e09